Plasma proteome profiling discovers novel proteins associated with non‐alcoholic fatty liver disease

Lili Niu, Philipp E Geyer, Nicolai J Wewer Albrechtsen, Lise L Gluud, Alberto Santos, Sophia Doll, Peter V Treit, Jens J Holst, Filip K Knop, Tina Vilsbøll, Anders Junker, Stephan Sachs, Kerstin Stemmer, Timo D Müller, Matthias H Tschöp, Susanna M Hofmann, Matthias Mann

image.png

Access article online

Abstract

Non‐alcoholic fatty liver disease (NAFLD) affects 25% of the population and can progress to cirrhosis with limited treatment options. As the liver secretes most of the blood plasma proteins, liver disease may affect the plasma proteome. Plasma proteome profiling of 48 patients with and without cirrhosis or NAFLD revealed six statistically significantly changing proteins (ALDOB, APOM, LGALS3BP, PIGR, VTN, and AFM), two of which are already linked to liver disease. Polymeric immunoglobulin receptor (PIGR) was significantly elevated in both cohorts by 170% in NAFLD and 298% in cirrhosis and was further validated in mouse models. Furthermore, a global correlation map of clinical and proteomic data strongly associated DPP4, ANPEP, TGFBI, PIGR, and APOE with NAFLD and cirrhosis. The prominent diabetic drug target DPP4 is an aminopeptidase like ANPEP, ENPEP, and LAP3, all of which are up‐regulated in the human or mouse data. Furthermore, ANPEP and TGFBI have potential roles in extracellular matrix remodeling in fibrosis. Thus, plasma proteome profiling can identify potential biomarkers and drug targets in liver disease.

Notebook

This notebook is a step by step guide on how to reproduce te analyses described in the Clinical Knowledge Graph article. The analyses described are performed in an automated manner following a sequence of steps defined in the configuration file (report_manager/config/proteomics.yml). Here, we use CKG’s API to show this analytical workflow.

[ ]:

[ ]:

[1]:
import os
import pandas as pd

import ckg.ckg_utils as ckg_utils

from ckg.analytics_core.analytics import analytics
from ckg.analytics_core.viz import viz

from ckg.report_manager import project, knowledge
from ckg.report_manager.dataset import  ProteomicsDataset, ClinicalDataset

from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
%matplotlib inline
init_notebook_mode(connected=True)
C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning: The package outdated is out of date. Your version is 0.2.0, the latest is 0.2.1.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  **kwargs
C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.3.10, the latest is 0.3.11.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  **kwargs
WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.
[2]:
analysis_dir = '../../../../data/tmp/Niu2019'
ckg_utils.checkDirectory(analysis_dir)

Load Data

This study is part of the datasets provided within CKG. The project data has already been loaded into CKG. The pipeline starts by building this project based on the types of datasets available (Clinical, Proteomic, etc.). We can access the data by creating a project object with the right project ifentifier and CKG will retrieve all the available data for us using the query_data() function.

[3]:
p = project.Project(identifier="P0000001", datasets={}, knowledge=None, report={}, configuration_files={})
project_info = p.query_data()
[4]:
project_info
[4]:
{'attributes':   acronym           data_types description identifier  \
 0   NAFLD  proteomics|clinical        None   P0000014

                                 name  number_subjects responsible status
 0  Non-alcoholic fatty liver disease               48        test   None  ,
 'similarity':                              current current_id  \
 0  Non-alcoholic fatty liver disease   P0000014
 1  Non-alcoholic fatty liver disease   P0000014
 2  Non-alcoholic fatty liver disease   P0000014
 3  Non-alcoholic fatty liver disease   P0000014
 4  Non-alcoholic fatty liver disease   P0000014
 5  Non-alcoholic fatty liver disease   P0000014

                                          description              other  \
 0  The altered molecular proteins and pathways in...  Covid-19 - plasma
 1                                               None           Melanoma
 2                                               None        QUOD Kidney
 3                                               None            DorDilu
 4                                               None    Melanoma-DIA-NN
 5                                               None         test mztab

    other_id responsible  similarity_pearson
 0  P0000017        test            0.100648
 1  P0000012        test           -0.025570
 2  P0000016        test           -0.039196
 3  P0000019        test           -0.064045
 4  P0000013        test           -0.070746
 5  P0000015        test           -0.233736  ,
 'overlap':         from  intersection                      project1_name  project1_total  \
 0   P0000013          2955                    Melanoma-DIA-NN            4764
 1   P0000013          3729                    Melanoma-DIA-NN            4764
 2   P0000016          3046                        QUOD Kidney            6638
 3   P0000012          2070                           Melanoma            3445
 4   P0000012          1666                           Melanoma            3445
 5   P0000012           914                           Melanoma            3445
 6   P0000012          1826                           Melanoma            3445
 7   P0000014           867  Non-alcoholic fatty liver disease            1927
 8   P0000017           729                  Covid-19 - plasma            3868
 9   P0000014           892  Non-alcoholic fatty liver disease            1927
 10  P0000014          1044  Non-alcoholic fatty liver disease            1927
 11  P0000013           767                    Melanoma-DIA-NN            4764
 12  P0000012           562                           Melanoma            3445
 13  P0000014           325  Non-alcoholic fatty liver disease            1927
 14  P0000016           690                        QUOD Kidney            6638
 15  P0000014             4  Non-alcoholic fatty liver disease            1927
 16  P0000015             3                         test mztab               4
 17  P0000012             4                           Melanoma            3445
 18  P0000015             4                         test mztab               4
 19  P0000013             4                    Melanoma-DIA-NN            4764
 20  P0000015             3                         test mztab               4

     project1_unique                      project2_name  project2_total  \
 0              1809                  Covid-19 - plasma            3868
 1              1035                        QUOD Kidney            6638
 2              3592                  Covid-19 - plasma            3868
 3              1375                    Melanoma-DIA-NN            4764
 4              1779                  Covid-19 - plasma            3868
 5              2531                            DorDilu            1531
 6              1619                        QUOD Kidney            6638
 7              1060                  Covid-19 - plasma            3868
 8              3139                            DorDilu            1531
 9              1035                    Melanoma-DIA-NN            4764
 10              883                        QUOD Kidney            6638
 11             3997                            DorDilu            1531
 12             2883  Non-alcoholic fatty liver disease            1927
 13             1602                            DorDilu            1531
 14             5948                            DorDilu            1531
 15             1923                         test mztab               4
 16                1                            DorDilu            1531
 17             3441                         test mztab               4
 18                0                  Covid-19 - plasma            3868
 19             4760                         test mztab               4
 20                1                        QUOD Kidney            6638

     project2_unique  similarity        to
 0               913    0.520521  P0000017
 1              2909    0.485990  P0000016
 2               822    0.408311  P0000017
 3              2694    0.337188  P0000013
 4              2202    0.295024  P0000017
 5               617    0.225012  P0000019
 6              4812    0.221146  P0000016
 7              3001    0.175933  P0000017
 8               802    0.156103  P0000019
 9              3872    0.153820  P0000013
 10             5594    0.138811  P0000016
 11              764    0.138748  P0000019
 12             1365    0.116840  P0000014
 13             1206    0.103734  P0000019
 14              841    0.092258  P0000019
 15                0    0.002076  P0000015
 16             1528    0.001958  P0000019
 17                0    0.001161  P0000015
 18             3864    0.001034  P0000017
 19                0    0.000840  P0000015
 20             6635    0.000452  P0000016  }

Here, project_info contains all the information (attributes) for the chosen project: name, acronym, description, etc, as well as overlap and similarity with other projects in CKG’s database. We save these attributes and project similarities in the project object p.

[5]:
p.set_attributes(project_info)
p.get_similar_projects(project_info)
p.get_projects_overlap(project_info)

Now it is time to create a dataset object for each data type in the project and store them as a dictionary of datasets in p. We create the datasets without specifying any configuration. When CKG runs in an automated manner, the configuration files in report_manager/config define how each dataset is analysed and which parameters should be used by default. Here, we will run the analyses of the protoemics data step by step.

[6]:
for data_type in p.data_types:
    dataset = None
    configuration = None
    if data_type == "proteomics":
        dataset = ProteomicsDataset(p.identifier, data={}, configuration=configuration, analysis_queries={}, report=None)
    elif data_type == "clinical":
        dataset = ClinicalDataset(p.identifier, data={}, configuration=configuration, analysis_queries={}, report=None)

    if dataset is not None:
        dataset.generate_dataset()
        p.update_dataset({data_type: dataset})

We can now see that our project p has two datasets - clinical and proteomics. These datasets will contain already serveral dataframes: - original: data as it was collected/generated, with no processing done.

  • processed: processed data after normalization, imputation, batch effect correction, etc.

  • specific dataframes: depending on the type of data extra dataframes are generated, for instance, a list of clinical variables or annotation from Gene Ontology or other databases

[7]:
p.list_datasets()
[7]:
dict_keys(['proteomics', 'clinical'])
[8]:
clinical_dataset = p.get_dataset('clinical')
clinical_dataset.list_dataframes()
[8]:
['clinical variables', 'original', 'processed']
[9]:
proteomics_dataset = p.get_dataset('proteomics')
proteomics_dataset.list_dataframes()
[9]:
['number of proteins',
 'number of peptides',
 'number of modified proteins',
 'protein biomarkers',
 'tissue qcmarkers',
 'metadata',
 'protein pathway annotation',
 'protein go annotation',
 'original',
 'processed']
[10]:
proteomics_dataset.get_dataframe('processed').head()
[10]:
identifier group sample subject A2M~P01023 A30~A2MYE2 ABI3BP~Q7Z7G0 ACE~P12821 ACTB~P60709 ACTN1~P12814 ADA2~Q9NZK5 ... VCAM1~P19320 VCL~P18206 VH6DJ~A2N0T4 VIM~P08670 VK3~A2N2F4 VNN1~O95497 VTN~P04004 VWF~P04275 YWHAZ~P63104 scFv~Q65ZC9
0 Cirrhosis 69_F8 69 38.005564 28.173504 21.631230 22.251041 27.090330 25.039968 23.442151 ... 26.016356 26.337731 31.159485 24.178889 25.835908 22.480055 32.815815 28.922779 22.347244 27.788928
1 Cirrhosis 70_F9 70 37.309118 27.981907 27.342062 23.847270 27.461155 25.896268 23.754503 ... 27.343842 25.535996 31.994997 23.709777 25.004889 23.852908 32.722121 29.881279 22.141285 26.869972
2 Cirrhosis 71_F10 71 37.384952 28.857627 21.080035 22.863630 27.929764 24.295225 23.359443 ... 26.353869 25.858635 30.139559 23.599064 26.271650 24.232132 32.755752 29.444625 21.972598 28.069328
3 Cirrhosis 72_F11 72 38.417225 28.978380 25.501910 22.992774 27.152479 25.231288 23.701340 ... 26.959475 26.531017 31.977294 24.179076 25.929200 24.269047 32.714014 29.397176 22.216971 28.170209
4 Cirrhosis 73_F12 73 37.471303 28.748744 20.200498 21.326143 27.537048 22.392992 22.406264 ... 26.473269 26.355535 30.485582 23.865224 26.701340 20.953141 32.722691 28.540895 18.630532 28.612280

5 rows × 512 columns

Processing of Proteomics Dataset

To show how to go from the original data to the processed dataframe, we will show what function is used and how the parameters are defined:

  • df: long-format pandas dataframe with columns ‘group’, ‘sample’, ‘subject’, ‘identifier’ (protein), ‘name’ (gene) and ‘LFQ_intensity’.

  • index_cols: column labels to be be kept as index identifiers.

  • drop_cols: column labels to be dropped from the dataframe.

  • group: column label containing group identifiers.

  • identifier: column label containing feature identifiers (i.e protein identifiers).

  • extra_identifier: column label containing additional protein identifiers (e.g. gene names).

  • filter_samples: if True filter samples with valid values below percentage (filter_samples_percent).

  • filter_samples_percent: defines the maximum percentage of missing values allowed in a sample.

  • imputation: if True performs imputation of missing values.

  • imputation_method: method for missing values imputation (‘KNN’, ‘distribuition’, or ‘mixed’)

  • missing_method: defines which expression rows are counted to determine if a column has enough valid values to survive the filtering process.

  • missing_per_group: if True filter proteins based on valid values per group; if False filter across all samples.

  • missing_max: maximum ratio of missing/valid values to be filtered.

  • min_valid: minimum number of valid values to be filtered.

  • value_col: column label containing expression values.

  • shift: when using distribution imputation, the down-shift

  • nstd: when using distribution imputation, the width of the distribution

  • knn_cutoff: when using KNN imputation, the minimum percentage of valid values for which to use KNN imputation (i.e. 0.6 -> if 60% valid values use KNN, otherwise MinProb)

  • normalize: whether or not to normalize the data

  • normalization_method: method to be used to normalize the data (‘median’, ‘quantile’, ‘linear’, ‘zscore’, ‘median_polish’) (only with normalize=True)

  • normalize_group: normalize per group or not (only with normalize=True)

  • normalize_by: whether the normalization should be done by ‘features’ (columns) or ‘samples’ (rows) (only with normalize=True)

[11]:
original_data = proteomics_dataset.get_dataframe('original')
original_data.head()
[11]:
LFQ_intensity batch group identifier name sample subject
0 21.593090 None NAFLD+T2DM M0R009 A1BG 63_F2 63
1 37.316049 None Cirrhosis P01023 A2M 77_G4 77
2 37.309118 None Cirrhosis P01023 A2M 70_F9 70
3 38.005564 None Cirrhosis P01023 A2M 69_F8 69
4 37.957887 None Cirrhosis P01023 A2M 76_G3 76
[12]:
processed_data = analytics.get_proteomics_measurements_ready(df=original_data, index_cols=['subject', 'sample', 'group'],
                                                             imputation=True,
                                                             imputation_method="distribution", missing_method="percentage",
                                                             extra_identifier="name",
                                                             filter_samples=False,
                                                             missing_per_group=True, missing_max=0.3,
                                                             shift=1.8, nstd=0.3,
                                                             value_col='LFQ_intensity')
[13]:
processed_data.head()
[13]:
identifier subject sample group A2M~P01023 A30~A2MYE2 ABI3BP~Q7Z7G0 ACE~P12821 ACTB~P60709 ACTN1~P12814 ADA2~Q9NZK5 ... VCAM1~P19320 VCL~P18206 VH6DJ~A2N0T4 VIM~P08670 VK3~A2N2F4 VNN1~O95497 VTN~P04004 VWF~P04275 YWHAZ~P63104 scFv~Q65ZC9
0 31 31_C6 Healthy 37.172267 27.313458 25.233156 21.729536 27.979074 24.172188 23.070420 ... 25.636164 26.854242 31.263381 24.123099 26.241109 23.096500 32.661954 27.711616 21.272205 28.179259
1 32 32_C7 Healthy 36.897240 28.550101 25.251670 20.671384 26.688458 24.518693 23.557155 ... 25.364461 26.409456 31.127802 24.090713 25.906809 23.449222 32.627384 28.778689 22.492604 29.175028
2 33 33_C8 Healthy 37.253761 28.393359 24.360115 19.435843 27.327060 24.788959 22.813820 ... 25.679386 27.330393 30.133080 23.291947 26.596194 22.319916 32.676529 28.839721 22.297905 28.177502
3 34 34_C9 Healthy 37.101435 27.986905 25.613204 22.740147 27.323972 24.928996 20.730150 ... 25.824324 26.238760 30.460023 23.684959 25.682791 23.034923 32.625970 28.588816 22.040461 28.077144
4 35 35_C10 Healthy 37.169563 28.806458 26.438967 23.431104 26.683380 25.832032 22.667416 ... 26.109427 26.766229 30.174624 24.099296 27.743296 24.107891 32.652091 28.482200 22.173561 28.387800

5 rows × 512 columns

Generate Report

Once the data for all the different data types has been loaded we can proceed with the statistical analysis and visualization of the results. This is what we define in CKG as generating a Report for each dataset.

To generate these reports, we make use of the functionality in the analytics core. The automated analysis uses the generate_report() function, which uses the configuration in report_manager/config to run the sequence of analysis defined for each dataset (clinical, proteomics). The code would be something like this:

project_report = p.generate_project_info_report()
p.update_report({"Project information": project_report})
for dataset_type in p.data_types:
    dataset = p.get_dataset(dataset_type)
    if dataset is not None:
        dataset.generate_report()

We will however run some of the analyses to showcase how these steps are done and can be easily modified using the available parameters.

Principal Component Analysis (PCA)

[14]:
pca_result, args = analytics.run_pca(processed_data, drop_cols=['sample', 'subject'], group='group')
[15]:
args.update({"loadings":15, "title":'PCA plot groups', 'height':600, 'width':700, 'factor':15})
plot = viz.get_pca_plot(pca_result, identifier='pca', args=args)
iplot(plot.figure)

Functional PCA - single sample Gene Set Enrichment Analysis (ssGSEA)

We will use the Gene Ontology annotations already extracted when creating the proteomics dataset (dataframe: protein go annotation).

[16]:
annotation = proteomics_dataset.get_dataframe('protein go annotation')
annotation.head()
[16]:
annotation identifier source
0 mitochondrial genome maintenance TYMP~P19971 UniProt
1 maltose metabolic process MGAM~O43451 UniProt
2 maltose metabolic process GAA~P10253 UniProt
3 ribosomal large subunit assembly RPL11~P62913 UniProt
4 ribosomal large subunit assembly RPLP0~P05388 UniProt
[17]:
ssgsea_result = analytics.run_ssgsea(data=processed_data, annotation=annotation, annotation_col='annotation',
                                     identifier_col='identifier', set_index=['group', 'sample','subject'],
                                     outdir=None, min_size=10, scale=False, permutations=0)
[18]:
pca_result, args = analytics.run_pca(data=ssgsea_result['nes'], drop_cols=['sample', 'subject'], group='group')
[19]:
args.update({"loadings":15, "title":'Functional PCA plot groups', 'height':600, 'width':700, 'factor':0.3})
plot = viz.get_pca_plot(data=pca_result, identifier='pca', args=args)
iplot(plot.figure)

Differential Regulation

[20]:
anova_result = analytics.run_anova(df=processed_data, alpha=0.05,
                                   drop_cols=['sample', 'subject'], subject='subject',
                                   group='group', correction='fdr_bh', is_logged=True)
[21]:
anova_result.head()
[21]:
identifier group1 group2 mean(group1) std(group1) mean(group2) std(group2) posthoc Paired posthoc Parametric posthoc T-Statistics ... FC efftype F-statistics pvalue padj correction rejected -log10 pvalue Method posthoc padj
0 A2M~P01023 Cirrhosis Healthy 37.813581 0.412603 37.119625 0.227292 False True 4.658543 ... 1.617713 hedges 5.019891 0.002073 0.037675 FDR correction BH True 3.709005 One-way anova 0.016579
1 A2M~P01023 Cirrhosis NAFLD+NGT 37.813581 0.412603 37.141632 0.263616 False True 4.339807 ... 1.593223 hedges 5.019891 0.002073 0.037675 FDR correction BH True 3.403735 One-way anova 0.025276
2 A2M~P01023 Cirrhosis NAFLD+T2DM 37.813581 0.412603 37.435432 0.592818 False True 1.655627 ... 1.299673 hedges 5.019891 0.002073 0.037675 FDR correction BH True 0.938823 One-way anova 0.444088
3 A2M~P01023 Cirrhosis T2DM 37.813581 0.412603 37.280256 0.398108 False True 2.778815 ... 1.447261 hedges 5.019891 0.002073 0.037675 FDR correction BH True 1.860078 One-way anova 0.112973
4 A2M~P01023 Healthy NAFLD+NGT 37.119625 0.227292 37.141632 0.263616 False True -0.199939 ... 0.984861 hedges 5.019891 0.002073 0.037675 FDR correction BH True 0.073776 One-way anova 0.972450

5 rows × 26 columns

[22]:
args={'alpha':0.05,
      'fc':2,
      'colorscale':'Blues',
      'showscale': False,
      'marker_size':10,
      'num_annotations':480,
      'x_title':'log2FC',
      'y_title':'-log10(pvalue)'}
figures = viz.run_volcano(anova_result, identifier='volcano', args=args)
for figure in figures:
    iplot(figure.figure)

Correlation Analysis

[23]:
correlation_result = analytics.run_correlation(processed_data, alpha=0.05,
                                               subject='subject', group='group',
                                               method='spearman', correction='fdr_bh')
[24]:
correlation_result.head()
[24]:
node1 node2 weight pvalue padj rejected
1 A30~A2MYE2 A2M~P01023 0.314264 0.0 0.0 True
4 ABI3BP~Q7Z7G0 A30~A2MYE2 -0.332827 0.0 0.0 True
8 ACE~P12821 ABI3BP~Q7Z7G0 0.056882 0.0 0.0 True
10 ACTB~P60709 A2M~P01023 -0.033977 0.0 0.0 True
11 ACTB~P60709 A30~A2MYE2 -0.104429 0.00242 0.0057 True
[37]:
network = viz.get_network(correlation_result, identifier="Correlation network",
                         args={'source':'node1', 'target':'node2',
                               'title':'Correlation network', 'values':'weight',
                               'cutoff':0.5, 'cutoff_abs':True, 'color_weight': True,
                               'communities_algorithm': 'louvain'})
[38]:
viz.visualize_notebook_network(network['notebook'], notebook_type='jupyter', layout={})

Functional Enrichment

[27]:
enrichment = analytics.run_up_down_regulation_enrichment(anova_result, annotation,
                                                         identifier='identifier', groups=['group1', 'group2'],
                                                         annotation_col='annotation', reject_col='rejected',
                                                         group_col='group', method='fisher',
                                                         correction='fdr_bh', alpha=0.05, lfc_cutoff=1)
C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


[28]:
figures = viz.get_enrichment_plots(enrichment, identifier='enrichment', args={'width':2200})
for fig in figures:
    iplot(fig.figure)

Knowledge from CKG

An option to retrieve relevant knowledge in CKG for the list of significant hits would be to use the functionality annotate_list(), which extracts information from the knowledge graph related to the provided list of proteins (and other entities as well). This result is mildly different results to the automatically generated knowledge report, which also includes results from the analysis of the clinical variables as well as results from the combination of both datasets.

For this, we need to create a Knowledge object and provide the list of proteins we are interested in. Further, the function gives the posibility to provide a list of diseases relevant to the study.

[29]:
kn = knowledge.Knowledge(identifier='NFLD', data=None)
[30]:
sig_hits = list(set(anova_result.loc[anova_result.rejected, "identifier"]))
print(sig_hits)
['HBG2~P69892', 'LGALS3BP~Q08380', 'ALDOB~P05062', 'APOM~O95445', 'PIGR~P01833', 'VTN~P04004', 'TTR~P02766', 'QSOX1~O00391', 'None~A8K1K1', 'PROC~P04070', 'A2M~P01023', 'IGHM~P01871', 'RBP4~P02753', 'LYVE1~Q9Y5Y7', 'ITIH1~P19827', 'V2-13~Q5NV73', 'C1QB~P02746', 'CPN2~P22792', 'IGFBP3~P17936', 'None~A0A120HG46', 'AFM~P43652', 'JCHAIN~P01591', 'ALDH1A1~P00352', 'CLU~P10909', 'VCAM1~P19320', 'IGH@~Q6GMX6', 'COLEC11~Q9BWP8', 'C3~P01024', 'IGFALS~P35858', 'SHBG~P04278', 'GP1BA~P07359', 'CPB2~Q96IY4', 'C6~P13671', 'C7~P10643', 'IGHV5-51~A0A0C4DH38', 'TGFBI~Q15582']
[31]:
kn.annotate_list(query_list=sig_hits,
                 entity_type='protein',
                 queries_file=None,
                 attribute=None,
                 diseases=['cirrhosis', 'non-alcoholic fatty liver disease', 'type 2 diabetes mellitus'],
                 entities=None)
[32]:
kn.graph
[33]:
kn.generate_report(visualizations=['network'], # how to visualize the results (network, sankey)
                   summarize=True, # Whether or not to summarize the annotation
                   method='pagerank', # Method for summarizing the annotation (betweenness, closeness, pagerank)
                   inplace=True) # If True, the summarized graph is saved, otherwise keep full graph
[34]:
kn.report.visualize_report(environment='notebook')[0]
[ ]: